Abstract


To meet the requirements of the EU Water Framework Directive, models are useful to predict communities in watercourses 
based on the abiotic characteristics of their aquatic environment. For that purpose back-propagation Artificial Neural Network 
(ANN) algorithms were used to induce predictive models on a dataset of the Zwalm river basin (Flanders, Belgium). This 
dataset consisted of 120 samples, collected over a 2-year period. Fifteen environmental variables were measured at each site, 
as well as the abundance of the aquatic macroinvertebrate taxa. Different neural networks were developed and optimized to 
obtain the best model configuration for the prediction of the habitat suitability of macroinvertebrate taxa. The best performing 
number of hidden layers and neurons and training algorithms have been searched for. The different options were theoretically 
and practically validated and assessed. The theoretical validation was based on cross-validation. For the practical validation, 
potential applications of the neural network models were analyzed, and the predictive performance of the models was assessed 
using ecological expert knowledge. The results indicate that the number of times a taxon was found in the whole river basin 
influences the performance measures and the architecture of the network. Based on Cohen’s kappa, it could be concluded
that ANN models predicting the presence/absence of very rare taxa (e.g. Aplexa) or very common taxa (e.g. Tubificidae) were
rather irrelevant, although their percentage of correctly classified instances (CCI) was high. For predicting the presence/absence
of Asellidae (a moderately present taxon), the highest performances (CCI and Cohen’s kappa) were found for the network model
with two hidden layers each having 10 neurons. When calculation time was also taken into account, the network model with one
hidden layer having 10 neurons could be preferred. With this network architecture, performances were only slightly worse, while
calculation time was much shorter. One may also conclude that not all network models resulted in a relevant relation between a
variable and a specific taxon. For Gammaridae for example, a rather small ANN structure gave a better idea of the impact of 
dissolved oxygen on its presence than a larger one. More reliable predictions and ecological interpretations for river ecosystem 
management would thus be possible provided the best configuration could be found. 

© 2004 Elsevier B.V. All rights reserved. 
1. Introduction


* Corresponding author. Tel.: +32-9-264-37-08; 
Human activities have severely deteriorated the Flemish river systems, and many functions such as
drinking water supply, fishing, etc. are threatened. Because the restoration of these river systems
entails drastic social (e.g. change in habits with regard to water use and discharge, urban planning)
and economic (e.g. investment in nature restoration, wastewater treatment system installation)
consequences, the decisions should be taken with enough forethought. Ecosystem models could
therefore act as interesting tools to support decision-making in river restoration management. In
particular, models that can predict the habitat requirements of organisms are of considerable
importance to ensure that the planned actions have the desired effects on the aquatic ecosystems. It
was shown that machine learning techniques such as Artificial Neural Networks (ANNs) basically mimic
aspects of biological information processing for data modeling and could be useful in ecology
(Recknagel, 2001). The prediction of aquatic communities by means of ANN models has recently been
discussed by several authors (Brosse et al., 1999; Guegan et al., 1998; Hoang et al., 2001; Kami
et al., 2000; Lae et al., 1999; Lee et al., 2003; Lek et al., 1996; Maier and Dandy, 1997; Maier and
Dandy, 2001; Maier et al., 1998; Mastrorillo et al., 1997, 1998; Olden and Jackson, 2002; Park
et al., 2001; Recknagel, 1997; Recknagel et al., 1997; Reyjol et al., 2001; Scardi, 2001; Scardi and
Harding, 1999; Schleiter et al., 1999; Wagner et al., 2000; Wei et al., 2001; Wilson and Recknagel,
2001). It is stressed that the ANN architecture is generally highly problem dependent (Maier and
Dandy, 2000). For this reason, it is necessary to develop and optimize the ANNs to obtain the best
model configuration, one that gives a lower error during training with minimal computing time.
Traditionally, optimal network geometries have been found by trial and error (Brosse et al., 1999;
Maier and Dandy, 2000). If predictions are made for different macroinvertebrate taxa simultaneously,
another problem could emerge because the frequency of occurrence (the number of sites at which a
taxon was found) could influence the ANN architecture. If the optimal ANN architecture could be found
and reliable predictions were possible, conclusions regarding ANN model design for practical use in
ecological river management could be drawn.

The aim of this paper was to discuss the development and optimization of different neural network
models to obtain the best model configuration for the prediction of macroinvertebrate taxa. Two taxa
were selected: Aplexa, which is a very rare taxon in the Zwalm river basin (found at 4.2% of the
sites), and Asellidae, which was present at 45.4% of the sites. The best performing network
architecture and training algorithms were searched for. Finally, an ecological interpretation of the
constructed models was made for Gammaridae.


2. Material and methods 

2.1. Study sites and collected data 

The Zwalm river basin, which is part of the hydrographical basin of the Upper-Scheldt (Carchon and
De Pauw, 1997), was selected as study area (Fig. 1). The basin has a total surface of 11,650 ha, and
the Zwalm river itself has a length of 22 km. The river has an irregular flow regime, ranging from
0.3 to 4.7 m³/s. Although Flanders is in general a rather flat region, the Zwalm river basin is
characterized by a number of differences in altitude, making it a quite unique ecosystem (Soresma,
2000). In the unpolluted headwaters a sensitive and vulnerable fauna is found (e.g. the bullhead
(Cottus gobio), the brook lamprey (Lampetra planeri) and the mayfly Heptageniidae). Since 1999, the
water quality in the Zwalm river basin has considerably improved due to investments in sewerage and
wastewater treatment plants during the preceding years (VMM, 2000). Several parts of the river are,
however, still polluted by untreated urban wastewater and by diffuse pollution originating from
agricultural activities (Goethals and De Pauw, 2001). Numerous structural and morphological
disturbances still exist (e.g. weirs for water quantity control, artificial embankments, etc.)
(Carchon and De Pauw, 1997).

In total, 60 sites were selected in the Zwalm river basin at which physical and chemical samples
were taken (Fig. 1). Observations regarding the structural characteristics were made. Each site was
examined twice over a 2-year period (summer of 2000 and 2001). In this way, 120 sets of observations
were available. Certain structural characteristics (meandering, substrate type, etc.) were visually
monitored (Dedecker et al., 2002b). Flow velocity was determined by timing the transport of a float
over a distance of 10 m. A number of flow velocity measurements at various places in the river (at
the centre of the stream and at the bank side) were taken and the average value was reported.
Control measurements were done by means of a propeller. Field measurements were made for temperature
and dissolved oxygen (OXI 330/SET), pH (Jenway 071) and conductivity (WTW LF 90). Suspended solids
were measured spectrophotometrically in the laboratory (Dedecker et al., 2002b). Macroinvertebrates
were collected by means of a standard handnet during 5-min kick sampling within a river stretch of
10 m (NBN, 1984) and by in situ exposure of artificial substrates (De Pauw et al., 1994). The
objective of the sampling was to collect the most representative diversity of the macroinvertebrates
at the examined site (De Pauw and Vanhooren, 1983). The structural characteristics and
physico-chemical variables (Table 1) were used as inputs for the neural network models to predict
the presence or absence (respectively represented by 1 and 0) of macroinvertebrate taxa in the
headwaters and brooks of the Zwalm river basin.

Fig. 1. Location of the Zwalm river basin in Flanders, Belgium. Position of the selected sampling sites in the Zwalm river basin.

2.2. Data processing 

Because the input variables have very different orders of magnitude, it is recommended to rescale
the data. In this way, more reliable predictions can be made. The variables are rescaled to the
interval [−1, 1] by using the following equation:

$$V_n = 2 \times \frac{V_o - V_{\min}}{V_{\max} - V_{\min}} - 1 \qquad (1)$$

in which $V_o$ and $V_n$ are respectively the old and new value of the variable for a sampling
point, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum values of that variable in the
original dataset. The targets are also rescaled over the interval [−1, 1] to adapt to the transfer
function used (tangential sigmoid) in the output layer. In this way, the network will be trained to
produce outputs in the range [−1, 1]. Afterwards, these outputs were converted back into the same
units which were used for the original targets. The continuous network output is mapped to 0 and 1
using a threshold of 0.5.

Table 1
Abiotic input variables and units used in the ANN model

Variables                          Units
Temperature                        °C
pH                                 -
Conductivity                       µS/cm
Suspended solids                   mg/l
Dissolved oxygen                   mg/l
Water level                        cm
Fraction of pebbles                % of river bed
Shade                              %
Water plants                       Present/absent
Width                              cm
Flow velocity                      m/s
Meandering                         6 classes (1 = well developed to 6 = absent)
Hollow river beds                  6 classes (1 = well developed to 6 = absent)
Deep/shallow variation             6 classes (1 = well developed to 6 = absent)
Artificial embankment structures   3 classes (0 = absent; 1 = moderate; 2 = intensive)
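The rescaling of Eq. (1) and the back-conversion of the network output can be illustrated with a minimal sketch. The function names `rescale` and `to_presence`, the example oxygen values and the choice to count an output of exactly 0.5 as "present" are assumptions made for illustration only; they are not taken from the original implementation.

```python
def rescale(values):
    """Map raw values linearly onto [-1, 1] following Eq. (1)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("variable is constant; Eq. (1) is undefined")
    return [2.0 * (v - v_min) / (v_max - v_min) - 1.0 for v in values]


def to_presence(output, threshold=0.5):
    """Convert a network output in [-1, 1] back to [0, 1], then to 0 or 1."""
    unit = (output + 1.0) / 2.0           # inverse of the target rescaling
    return 1 if unit >= threshold else 0  # treating a tie as presence is an assumption


oxygen = [2.0, 5.0, 8.0, 11.0]            # made-up dissolved oxygen values, mg/l
scaled = rescale(oxygen)
predictions = [to_presence(y) for y in (-0.8, 0.1, 0.9)]
print(scaled)
print(predictions)
```

In practice the minimum and maximum would be taken from the original dataset and reused unchanged for any new sampling point.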

2.3. Artificial Neural Networks 

In this study, different neural network models were tested and optimized to obtain the best model
configuration for the prediction of the habitat suitability of macroinvertebrate taxa. The modeling
method was based on the principles of the backpropagation algorithm (Rumelhart et al., 1986). The
construction of the ANN model was based on examples of data with known outputs. A backpropagation
network typically comprises three types of neuron layers: an input layer, one or more hidden layers
and an output layer, each including one or more neurons. As shown in Fig. 2, nodes from one layer
are connected to all nodes in the following layer, but neither lateral connections within a layer
nor feed-back connections are possible. Fifteen input neurons are used, each representing an
environmental variable. The output layer comprises one neuron, indicating the presence or absence of
a macroinvertebrate taxon. With the exception of the input neurons, which only connect one input
value with its associated weight values, the net input $a_j$ for each neuron $j$ is the sum of all
input values $x_i$, each multiplied by its weight $w_{ji}$, plus a bias term $z_j$, which may be
considered as the weight from a supplementary input equalling one:

$$a_j = \sum_i w_{ji} x_i + z_j \qquad (2)$$

The output value, $y_j$, can be calculated by feeding the net input into the transfer function $f$
of the neuron:

$$y_j = f(a_j) \qquad (3)$$

Fig. 2. Illustration of a three-layered neural network with one input layer, one hidden layer and one output layer.

Many transfer functions can be used. In this study, two types of sigmoid functions have been
implemented: the logarithmic (for the hidden layer neurons) and tangential (for the output layer
neurons) sigmoid transfer function. Layers of neurons with non-linear transfer functions allow the
network to learn non-linear and linear relationships between input and output vectors. Thus they are
ideally suited for the modeling of ecological data which are often known to be non-linear (Lek and
Guegan, 1999).
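As an illustration of Eqs. (2) and (3), the forward pass through one hidden layer with logarithmic sigmoid neurons and an output neuron with a tangential sigmoid can be sketched as follows. The toy network size (3 inputs, 2 hidden neurons) and all weight values are arbitrary assumptions chosen to keep the example short; the study itself used 15 inputs.

```python
import math


def logsig(a):
    """Logarithmic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def tansig(a):
    """Tangential sigmoid, output in (-1, 1)."""
    return math.tanh(a)


def layer(inputs, weights, biases, transfer):
    """Apply Eq. (2) and then Eq. (3) to every neuron j of one layer."""
    outputs = []
    for w_j, z_j in zip(weights, biases):
        a_j = sum(w * x for w, x in zip(w_j, inputs)) + z_j  # net input, Eq. (2)
        outputs.append(transfer(a_j))                        # output, Eq. (3)
    return outputs


def forward(x, hidden_w, hidden_z, out_w, out_z):
    """Propagate one input vector through hidden and output layer."""
    hidden = layer(x, hidden_w, hidden_z, logsig)
    return layer(hidden, out_w, out_z, tansig)[0]


# Toy network: 3 inputs, 2 hidden neurons, 1 output; weights are arbitrary.
x = [0.5, -1.0, 0.25]
hidden_w = [[0.2, -0.4, 0.1], [-0.3, 0.6, 0.5]]
hidden_z = [0.0, 0.1]
out_w = [[1.5, -2.0]]
out_z = [0.2]
y = forward(x, hidden_w, hidden_z, out_w, out_z)
print(y)
```

Because the output neuron uses the tangential sigmoid, `y` always lies strictly between -1 and 1, which is why the targets are rescaled to that interval.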

Before training, the values of the weights and biases are initially set to small random numbers.
Subsequently, a set of input/output vector pairs is presented to the network. For each input vector,
the output vector is calculated by the neural network model, and an error term is calculated for the
outputs of all hidden and output neurons, by comparing the calculated output vector and the actual
output vector. Using this error term, the weights and biases are updated in order to decrease the
error, so future outputs are more likely to be correct. This procedure is repeated until the errors
become small enough or a predefined maximum number of iterations is reached. This iterative process
is termed “training”. After the training, the ANN can be validated using independent data.

In this study, two variations of the basic backpropagation algorithm have been compared to train the
models: the gradient descent algorithm and the Levenberg-Marquardt algorithm (Hagan et al., 1996).
The gradient descent algorithm updates the network weights and biases in the direction of the
negative of the gradient. One iteration of this algorithm can be written as

$$x_{k+1} = x_k - \alpha_k g_k \qquad (4)$$

in which $x_k$ is a vector of current weights and biases, $g_k$ is the current gradient, and
$\alpha_k$ is the learning rate.


The Levenberg-Marquardt algorithm is similar to the quasi-Newton method in which a simplified form
of the Hessian matrix (second derivatives) is used. The Hessian matrix can be approximated as

$$H = J^T J \qquad (5)$$

and the gradient can be computed as

$$g = J^T e \qquad (6)$$

in which $J$ is the Jacobian matrix which contains first derivatives of the network errors with
respect to the weights and biases, and $e$ is a vector of network errors. One iteration of this
algorithm can be written as

$$x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e \qquad (7)$$

where $\mu$ is the learning rate and $I$ the identity matrix (Hagan et al., 1996). During training
the learning rate $\mu$ is incremented or decremented by a scale factor at weight updates. When
$\mu$ is 0, this is just Newton’s method, using the approximate Hessian matrix. When $\mu$ is large,
this becomes gradient descent with a small step size. The Levenberg-Marquardt algorithm was reported
to have the fastest convergence for neural networks that contain up to a few hundred neurons (Kami
et al., 2000).
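The two update rules, Eq. (4) and Eq. (7), can be compared on a deliberately small problem. The sketch below fits a single neuron y = tanh(w x + b) with two parameters, so that the matrix in Eq. (7) is only 2 x 2 and can be inverted by hand. The data, the fixed values of the learning rates and the function names are assumptions for illustration; in particular, $\mu$ is kept constant here, whereas the text describes it as being adapted during training.

```python
import math

# Toy data for a single neuron y = tanh(w * x + b); values are made up.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
TS = [-0.9, -0.6, 0.1, 0.7, 0.95]


def errors_and_jacobian(params):
    """Return the error vector e and the Jacobian J = de/d(w, b)."""
    w, b = params
    e, jac = [], []
    for x, t in zip(XS, TS):
        y = math.tanh(w * x + b)
        slope = 1.0 - y * y              # derivative of tanh at the net input
        e.append(y - t)
        jac.append([slope * x, slope])   # one row per sample
    return e, jac


def gradient(e, jac):
    """g = J^T e, Eq. (6)."""
    return [sum(row[k] * err for row, err in zip(jac, e)) for k in range(2)]


def gradient_descent_step(params, alpha):
    """Eq. (4): x_{k+1} = x_k - alpha * g_k."""
    e, jac = errors_and_jacobian(params)
    g = gradient(e, jac)
    return [p - alpha * g_k for p, g_k in zip(params, g)]


def levenberg_marquardt_step(params, mu):
    """Eq. (7) for two parameters, with an explicit 2 x 2 inverse."""
    e, jac = errors_and_jacobian(params)
    g = gradient(e, jac)
    h00 = sum(r[0] * r[0] for r in jac) + mu   # entries of J^T J + mu * I
    h01 = sum(r[0] * r[1] for r in jac)
    h11 = sum(r[1] * r[1] for r in jac) + mu
    det = h00 * h11 - h01 * h01
    step = [(h11 * g[0] - h01 * g[1]) / det, (h00 * g[1] - h01 * g[0]) / det]
    return [p - s for p, s in zip(params, step)]


def sse(params):
    """Sum of squared errors for the current parameters."""
    e, _ = errors_and_jacobian(params)
    return sum(err * err for err in e)


start = [0.1, 0.0]
gd, lm = start, start
for _ in range(10):
    gd = gradient_descent_step(gd, alpha=0.05)
    lm = levenberg_marquardt_step(lm, mu=0.1)
print(sse(start), sse(gd), sse(lm))
```

For a full network the matrix to invert has one row and column per weight and bias, which is where the memory and time cost of the Levenberg-Marquardt algorithm comes from.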

The model validation was based on stratified 10-fold cross-validation (Witten and Frank, 2000). For
10-fold cross-validation the data are split into 10 folds or partitions. Each fold in turn is used
for validation while the rest is used for training. That is, nine-tenths are used for training and
one-tenth for validation, and the procedure is repeated 10 times so that in the end, every instance
has been used exactly once for validation. To allow a reliable error estimate of the models, 10
stratified 10-fold cross-validation experiments were conducted. To compare the performances of the
models trained with the gradient descent algorithm and the Levenberg-Marquardt algorithm and the
models with different architectures, a paired t-test was done after checking for normality, to
determine whether the mean of the set of samples was significantly greater or less. A paired t-test
could be applied because the same splits were used to obtain a matched pair of results. For the
two-tailed test, a significance level of 5% was used.
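A stratified 10-fold split and the paired t statistic can be sketched in a few lines. This is a simplified illustration, not the procedure of Witten and Frank (2000) or of the original MATLAB implementation: the dealing scheme in `stratified_folds`, the function names and the example labels are assumptions.

```python
import math
import random


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of indices with presences and absences spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:                # deal indices out like cards
            folds[position % k].append(i)
            position += 1
    return folds


def train_validation_splits(labels, k=10, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def paired_t_statistic(a, b):
    """t statistic for matched results a[i], b[i] obtained on the same splits."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n)


# 120 samples with 54 presences (45%); the labels themselves are made up.
labels = [1] * 54 + [0] * 66
splits = list(train_validation_splits(labels))
presences_per_fold = [sum(labels[i] for i in val) for _, val in splits]
print(presences_per_fold)
```

Repeating the split with ten different values of `seed` would correspond to the ten cross-validation experiments; the resulting t statistic is compared with the critical value of the t distribution with n − 1 degrees of freedom.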

The models were evaluated on the basis of two performance measures: the percentage of correctly
classified instances (CCI) and the Cohen’s kappa ($\kappa$). This requires the derivation of
confusion matrices that identify the true positive (TP), false positive (FP), false negative (FN)
and true negative (TN) cases predicted by each model (Fielding and Bell, 1997). In that way,
observed (actual) presence/absence patterns were tabulated against those predicted (Table 2).

Table 2
The confusion matrix as a basis for the performance measures with true positive values (TP), false
positives (FP), false negatives (FN) and true negative values (TN)

                 Actual
Predicted        +         −
+                TP        FP
−                FN        TN

The first performance measure that was calculated was the percentage of CCI:

$$\mathrm{CCI} = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \qquad (8)$$


Another performance measure that was calculated was the Cohen’s kappa (Cohen, 1960). It is a simply
derived statistic that measures the proportion of all possible cases of presence or absence that are
predicted correctly by a model after accounting for chance predictions:

$$\kappa = \frac{(TP + TN) - \left[\left((TP + FN)(TP + FP) + (FP + TN)(FN + TN)\right)/n\right]}{n - \left[\left((TP + FN)(TP + FP) + (FP + TN)(FN + TN)\right)/n\right]} \qquad (9)$$

in which $n = TP + FP + FN + TN$ is the total number of instances.
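Both performance measures follow directly from the four cells of the confusion matrix. The sketch below uses two hypothetical confusion matrices (not results from the study): a model that always predicts "absent" for a taxon present in 5 of 120 samples, and a model for a taxon present in roughly half of the samples. Returning 0 when the denominator vanishes is a convention assumed here.

```python
def cci(tp, fp, fn, tn):
    """Percentage of correctly classified instances, Eq. (8)."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa computed from the confusion matrix cells."""
    n = tp + fp + fn + tn
    chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n
    if n == chance:
        return 0.0  # assumed convention for the degenerate single-class case
    return ((tp + tn) - chance) / (n - chance)


# Hypothetical model that always predicts "absent" for a rare taxon.
always_absent = dict(tp=0, fp=0, fn=5, tn=115)
print(cci(**always_absent), cohens_kappa(**always_absent))

# Hypothetical model for a taxon present in about half of the samples.
balanced = dict(tp=40, fp=14, fn=14, tn=52)
print(cci(**balanced), cohens_kappa(**balanced))
```

The first case shows why a CCI close to 96% can coincide with a kappa of 0: all correct predictions are absences that would also be expected by chance.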

To obtain the best model configuration for the prediction of the habitat suitability of Aplexa and
Asellidae, two taxa with a different frequency of occurrence, two training algorithms were compared:
the gradient descent algorithm and the Levenberg-Marquardt algorithm. For both training algorithms
different network architectures were analyzed: five three-layered and four four-layered networks
with respectively [2], [5], [10], [20], [25] and [5 5], [10 5], [10 10], [20 10] neurons in the
hidden layer(s). The neural network models were implemented with the neural network extension of the
software package MATLAB 5.3 for MS Windows™.
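To give an impression of how the nine architectures differ in size, the number of adjustable parameters (weights plus biases) can be counted, assuming fully connected layers with 15 inputs, one bias per neuron and a single output neuron as described in Section 2.3. The helper `count_parameters` is illustrative and not part of the original implementation.

```python
N_INPUTS, N_OUTPUTS = 15, 1

ARCHITECTURES = [[2], [5], [10], [20], [25], [5, 5], [10, 5], [10, 10], [20, 10]]


def count_parameters(hidden, n_inputs=N_INPUTS, n_outputs=N_OUTPUTS):
    """Number of weights plus biases in a fully connected feed-forward net."""
    sizes = [n_inputs] + list(hidden) + [n_outputs]
    # Each neuron has one weight per incoming connection plus one bias.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


for hidden in ARCHITECTURES:
    print(hidden, count_parameters(hidden))
```

Under these assumptions the networks range from 35 parameters for [2] to 541 for [20 10], which is relevant for the Levenberg-Marquardt algorithm because the matrix inverted in each iteration grows with this number.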


3. Results 

3.1. Development and optimization of the ANN model configuration

The percentage of CCI and the Cohen’s kappa for Aplexa (Mollusca) are shown in Figs. 3 and 4,
respectively. Ten 10-fold cross-validations were conducted to obtain a reliable estimate for the
performance measures. A 95% confidence interval of the average is also shown. The CCI was high for
both training algorithms. Based on the paired t-test (significance level of 5%), the CCI obtained
with the gradient descent algorithm was not significantly different for the nine model
architectures. The CCI was between 95.3 and 95.6%. The CCI obtained with the Levenberg-Marquardt
algorithm was between 89.8 and 93.6%. When the architecture of the network models trained with the
Levenberg-Marquardt algorithm became more complex, the CCI was significantly better, based on the
paired t-test (significance level of 5%). A paired t-test was performed to compare the average CCI
over ten 10-fold cross-validations of the gradient descent and the Levenberg-Marquardt algorithm.
This statistical test could be used because the same cross-validation splits were used for both
training algorithms. The results revealed a significant difference (significance level of 5%)
between the CCI of the gradient descent and the Levenberg-Marquardt algorithm for all nine model
architectures except for the architecture with 25 neurons in the hidden layer. For all the analyzed
network architectures, the gradient descent algorithm resulted in a significantly higher percentage
of CCI. The second performance measure that was calculated was the Cohen’s kappa, which accounts for
the amount of chance predictions made by a model. Because the gradient descent algorithm predicted
Aplexa absent at all sites, the Cohen’s kappa remained 0 for all network architectures. The ANN
models trained with the Levenberg-Marquardt algorithm had a Cohen’s kappa between −0.02 and 0.01.
Although a high CCI could be found for all network architectures, their Cohen’s kappa indicated that
this prediction success was merely based on chance. The Cohen’s kappa of Aplexa was too low for both
training algorithms, implying that these models cannot be considered relevant.
Fig. 3. Comparison of the percentage correctly classified instances for Aplexa (Mollusca) with the gradient descent and the
Levenberg-Marquardt algorithm in different ANN architectures.


Contrary to Aplexa, Asellidae (Crustacea) are intermediately frequent in the Zwalm river basin. They
were found at 45.4% of the sites. The CCI and the Cohen’s kappa for Asellidae are shown in Figs. 5
and 6, respectively. The CCI was relatively high for both training algorithms. The CCI was between
73.9 and 76.6% for the gradient descent algorithm, and between 70.4 and 72.9% for the
Levenberg-Marquardt algorithm. A Cohen’s kappa between 0.45 and 0.51 was found for the gradient
descent algorithm. The Cohen’s kappa for the Levenberg-Marquardt algorithm was between 0.38 and
0.43. Based on Manel et al. (2001), a Cohen’s kappa above 0.40 for presence/absence models is
considered to indicate ‘moderate’ model performance while lower values indicate a low model
performance. ANN models trained with the gradient descent algorithm outperformed ANN models trained
with the Levenberg-Marquardt algorithm based on the CCI and the Cohen’s kappa. The difference in
performance was at most 3.9% (network architectures [10 10] and [20 10]) based on the CCI and at
most 0.09 (network architecture [5 5]) based on the Cohen’s kappa. A paired t-test was also
conducted to compare the average CCI and Cohen’s kappa over ten 10-fold cross-validations of the
gradient descent and the Levenberg-Marquardt algorithm. By applying this test, a significant
difference (significance level of 5%) between the CCI of the gradient descent and the
Levenberg-Marquardt algorithm was detected for all network architectures except for [2], [20] and
[25]. Based on the Cohen’s kappa, a significant difference (significance level of 5%) was found for
the network architectures [5], [10], [5 5], [10 10] and [20 10]. Based on the CCI and the Cohen’s
kappa, the best performing network was the ANN model trained with the gradient descent algorithm
with 10 neurons in both hidden layers. However, this ANN model was only significantly different
(significance level of 5%) from the network [25].

Fig. 4. Comparison of the Cohen’s kappa for Aplexa (Mollusca) with the gradient descent and the Levenberg-Marquardt algorithm in
different ANN architectures.

Fig. 5. Comparison of the percentage correctly classified instances for Asellidae (Crustacea) with the gradient descent and the
Levenberg-Marquardt algorithm in different ANN architectures.

Fig. 6. Comparison of the Cohen’s kappa for Asellidae (Crustacea) with the gradient descent and the Levenberg-Marquardt algorithm in
different ANN architectures.

Fig. 7. The impact of dissolved oxygen on the probability of presence of Gammaridae (Crustacea). The predictions have been made for
nine different network architectures. Also the cumulative curve of the observed presence of Gammaridae in the Zwalm river basin is
presented.


3.2. Ecological interpretation of induced models

In ecology, it is useful to know the magnitude of the impact of a variable. Therefore, an
experimental approach could be used to determine the response of the model to each of the input
variables separately (Lae et al., 1999; Dedecker et al., 2002a). A single independent variable is
varied over a range while the other inputs to the model are held constant. In this way, one is able
to determine the impact of the variable on the presence or absence of a specific taxon. In order to
limit the computation time, the number of points for each curve was limited to 11, delimiting 10
equal intervals over the range of the variable. Different network architectures with the gradient
descent algorithm were applied for each variable. Fig. 7 shows the effect of dissolved oxygen on
Gammaridae (Crustacea). The networks [20], [25], [10 10] and [20 10] predicted Gammaridae as always
present. The networks [2], [5], [10], [5 5] and [10 5] showed an increasing relation between
dissolved oxygen and the predicted probability of presence of Gammaridae. The cumulative curve of
the observed presence of Gammaridae in the Zwalm river basin is also presented in Fig. 7.
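The procedure of varying one input over 11 points while holding the others constant can be sketched as follows. The function `toy_model` is only a stand-in for a trained network and is not the model from this study; the baseline values at which the remaining inputs are held, and the range of 0 to 12 mg/l, are likewise assumptions, since the text does not specify them.

```python
import math


def response_profile(model, baseline, index, low, high, n_points=11):
    """Vary one input over [low, high] while the others stay at baseline."""
    profile = []
    for step in range(n_points):
        value = low + (high - low) * step / (n_points - 1)
        inputs = list(baseline)          # copy so the baseline is not modified
        inputs[index] = value
        profile.append((value, model(inputs)))
    return profile


def toy_model(inputs):
    """Stand-in for a trained network (hypothetical, two inputs only)."""
    oxygen, temperature = inputs
    return 1.0 / (1.0 + math.exp(-(0.8 * oxygen - 4.0 + 0.1 * temperature)))


baseline = [7.0, 15.0]   # hypothetical [dissolved oxygen mg/l, temperature in degrees C]
curve = response_profile(toy_model, baseline, index=0, low=0.0, high=12.0)
for oxygen, probability in curve:
    print(round(oxygen, 1), round(probability, 3))
```

With 11 points the range is divided into 10 equal intervals, as in the text; plotting the second element of each pair against the first gives a response curve of the kind shown in Fig. 7.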


4. Discussion 

Maier and Dandy (2000) mentioned that network architecture is generally highly problem dependent. A
number of systematic approaches for determining optimal network geometry have been proposed,
including pruning and constructive algorithms. The basic thought of pruning algorithms is to start
with a network that is large enough to capture the desired input-output relationship and to
subsequently remove or disable unnecessary weights and/or neurons. Constructive algorithms approach
the problem of optimizing the number of hidden layer neurons from the opposite direction to pruning
algorithms. The smallest possible network is used at the start. Hidden layer neurons and connections
are then added one at a time in an attempt to improve model performance. Traditionally, however,
optimal network geometries have been found by trial and error (Brosse et al., 1999; Maier and Dandy,
2000). In this paper, trial and error is used to optimize the neural network architecture.
Evaluating the predictive model performance frequently involves determining the percentage of sites
for which presence or absence of organisms is correctly predicted (Manel et al., 2001). There is
clear evidence though, that the CCI is influenced by
the frequency of occurrence of the organism being modeled (Dedecker et al., 2002a; Fielding and
Bell, 1997; Manel et al., 1999). The problem with rare taxa is that there is little information to
allow the neural network model to learn when these taxa are present. In this way the models tend to
“learn” that very rare taxa are always absent. The same difficulty occurs with very common taxa.
Here the models “learn” that very common taxa are always present. This problem is illustrated in the
present study by the prediction of the presence/absence of Aplexa. Aplexa was found at only 4.2% of
the sites, making it a rare taxon in the Zwalm river basin. The CCI was high for both training
algorithms, namely between 95.3 and 95.6% for the gradient descent algorithm and between 89.8 and
93.6% for the Levenberg-Marquardt algorithm. As stressed by Manel et al. (2001), it is important to
look at the predictions of the sites where the rare taxa are present and the common taxa are absent.
Otherwise the evaluation of these models could be misleading. For this reason, it was decided to
integrate an additional performance measure to assess the models, namely the Cohen’s kappa (Cohen,
1960). The combination of CCI with Cohen’s kappa, a measure of the proportion of all possible cases
of presence or absence that are predicted correctly after accounting for chance effects, allowed a
better interpretation of the predictive performance of the models (D’heygere et al., 2004). The ANN
models trained with the gradient descent algorithm predicted Aplexa as always absent, resulting in a
Cohen’s kappa of 0 for all network architectures. The models trained with the Levenberg-Marquardt
algorithm were able to predict Aplexa as present in some cases. However, the network also predicted
Aplexa as present at sites where it was not found, resulting in a Cohen’s kappa between −0.02 and
0.01. Based on the Cohen’s kappa, it could be concluded that the produced models for Aplexa were
irrelevant, although their CCI was high. For the very common taxa, similar conclusions could be
drawn. Contrary to Aplexa, good model performances were obtained for Asellidae, which were found at
45.4% of the sites. The CCI was relatively high for both training algorithms. However, based on the
CCI the ANN models trained with the gradient descent algorithm outperformed the models trained with
the Levenberg-Marquardt algorithm for all network architectures. Also the Cohen’s kappa was higher
for all network architectures
trained with the gradient descent algorithm. Based on Manel et al. (2001), a Cohen’s kappa above
0.40 for presence/absence models is considered to indicate ‘moderate’ model performance while lower
values indicate a low model performance. In this way, the Cohen’s kappas obtained with the gradient
descent algorithm can be classified as ‘moderate’ performances, while some of the Cohen’s kappas
obtained with the Levenberg-Marquardt algorithm indicate a low model performance. Based on these two
performance measures, a neural network model trained with the gradient descent algorithm is
preferred. Another drawback of the Levenberg-Marquardt algorithm in comparison with the gradient
descent algorithm was the long calculation time for the more complex network architectures, caused
by its demand for memory to operate with large Jacobians and the necessity of inverting large
matrices. The dimension of the matrices to be inverted is equal to the number of weights in the
system. Such large matrices must be inverted at each iteration step and this results in a large
computation time. The highest performances (CCI and Cohen’s kappa) were found for the network model
with two hidden layers each having 10 neurons. In this way, this is the most appropriate model to
predict the presence/absence of Asellidae. However, when calculation time is also taken into
account, the network model with one hidden layer having 10 neurons could be preferred. With this
network architecture, performances were only slightly worse, while calculation time was much
shorter. As demonstrated, the predictive ability of a given network architecture and training
algorithm depends on the frequency of presence of a given macroinvertebrate taxon. If in another
case the frequency of a taxon is unknown a priori, the choice of a suitable model can be based on
data from other studies which discuss similar systems. These comparable systems could give an
indication whether a macroinvertebrate taxon is expected to be very rare, very common or rather
moderately present. If these studies indicate that a macroinvertebrate taxon is expected to be
moderately present, predictions could be based on ANN models, because good results could be obtained
as shown in this study. On the other hand, if a taxon is expected to be very rare or very common,
predictions could be based on expert-knowledge-based Fuzzy Logic models, in which external expert
knowledge can be incorporated, because ANN predictions seem to be rather irrelevant in these cases
based on this study.

Further optimization of the ANN models can be obtained 
by selecting more appropriate input variables 
using, e.g. genetic algorithms (D’heygere et al., 
2002, 2004). The variables that are not selected can 
be regarded as irrelevant for a particular taxon (Witten 
and Frank, 2000). In ANN models, irrelevant 
information is also sent through the nodes and can 
thus slightly alter the connection weights and affect 
the overall performance of the ANNs (D’heygere et al., 
2004). In this study, a set of parameters including 
learning rate, momentum and threshold value was held 
constant. In the future, genetic algorithms will also be 
used to automatically calibrate these parameters of the 
network (D’heygere et al., 2004). 
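
The idea of selecting input variables with a genetic algorithm can be sketched as follows. This is a generic, minimal illustration in Python and not the procedure of D'heygere et al.; the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values and the fitness function are assumptions. In practice the fitness of a mask would be the validation performance (e.g. Cohen's kappa) of an ANN trained on the selected variables; here a hypothetical toy fitness stands in for it.

```python
import random


def evolve_mask(fitness, n_vars, pop_size=20, generations=30,
                mutation_rate=0.05, seed=0):
    """Search for a 0/1 mask over n_vars inputs that maximizes fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [list(best)]  # elitism: keep the best mask so far
        while len(next_population) < pop_size:
            # Tournament selection: best of three randomly drawn masks.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_vars - 1)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Hypothetical stand-in for "performance of an ANN trained on the selected
# inputs": three variables are treated as relevant, and every selected
# variable carries a small cost.
RELEVANT = {0, 3, 7}


def toy_fitness(mask):
    reward = sum(mask[i] for i in RELEVANT)
    penalty = 0.1 * sum(mask)
    return reward - penalty


# 15 inputs, matching the fifteen environmental variables in the dataset.
best_mask = evolve_mask(toy_fitness, n_vars=15)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask of each generation is carried over unchanged, the fitness of the returned mask can never be lower than that of the best mask in the initial random population.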

In many studies, ANN models have been shown to 
have superior predictive power compared to traditional 
approaches, e.g. multiple regression (Lek et al., 
1996). However, a disadvantage of ANN models in 
comparison with conventional models is that they do not 
explain the relative importance of each 
independent variable considered. For this reason, ANN 
models have been labeled a ‘black box’. This lack 
of explanatory power is a major concern to ecologists, 
since the interpretation of statistical models is desirable 
for gaining knowledge of the causal relationships 
driving ecological phenomena (Olden and Jackson, 
2002). Olden and Jackson (2002) describe a number 
of methods for understanding the mechanics of ANN 
models. They propose a randomization test for ANN 
models, which provides both a statistical pruning technique 
for eliminating null connection weights that 
minimally influence the predicted output and 
a selection method for identifying independent 
variables that contribute significantly to network 
predictions. Ozesmi and Ozesmi (1999) proposed the 
neural interpretation diagram (NID) for providing a 
visual interpretation of the connection weights among 
neurons, where the relative magnitude of each connection 
weight is represented by line thickness and the 
direction of the weight by line shading. Garson 
(1991) proposed a method for partitioning the neural 
network connection weights in order to determine 
the relative importance of each input variable in the 
network. A number of investigators have used sensitivity 
analysis (Dedecker et al., 2002a; Guegan et al., 
1998; Lae et al., 1999; Lek et al., 1996; Mastrorillo 
et al., 1998), varying one input variable across its entire 
range while holding all other input variables constant, 
so that the individual contribution of each variable can be 
assessed. In this work, a sensitivity analysis was also 
performed. As mentioned in the literature, Gammaridae 
prefer relatively high levels of dissolved oxygen. 
This relationship can also be derived from the induced 
models (Fig. 7). However, not all the networks 
gave this relationship. The most complex networks, 
networks [20], [25], [10 10] and [20 10], predicted 
Gammaridae as always present, which is ecologically 
inappropriate. The presence of Gammaridae in the 
Zwalm river basin was analyzed by composing the cumulative 
curve of the observed values. Comparing 
this curve with the predicted probability of presence 
of Gammaridae, the best approximation is given by the 
network with two hidden layers [5 5]. When 
the best network model was used, however, sensitivity analysis 
provided useful insight into the habitat preference of 
that taxon, which is important information for 
river ecosystem management. Lae et al. (1999), for 
example, illustrated the influence of six independent 
environmental variables on fish yield in their ANN 
model. For most variables the authors found ecologically 
relevant relations. This is in contrast to this 
research, where the relations between some variables 
and the presence/absence of the macroinvertebrate 
taxa were difficult to interpret, although the predictive 
performance of the ANN models was in general good. 
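
The sensitivity analysis described above, in which one input is varied across its range while the others are held constant, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: holding the other inputs at their sample mean is one possible choice of constant, `toy_model` is a hypothetical logistic function standing in for a trained network, and the sample values and units are invented for the example.

```python
import math


def sensitivity_profile(predict, samples, var_index, n_steps=11):
    """Vary one input over its observed range while holding the other
    inputs at their sample mean; return (value, prediction) pairs."""
    n_vars = len(samples[0])
    means = [sum(row[j] for row in samples) / len(samples)
             for j in range(n_vars)]
    low = min(row[var_index] for row in samples)
    high = max(row[var_index] for row in samples)
    profile = []
    for step in range(n_steps):
        value = low + (high - low) * step / (n_steps - 1)
        x = list(means)
        x[var_index] = value
        profile.append((value, predict(x)))
    return profile


def response_range(profile):
    """Spread of the predictions; 0 means the input has no visible effect."""
    predictions = [p for _, p in profile]
    return max(predictions) - min(predictions)


def toy_model(x):
    """Hypothetical stand-in for a trained ANN: probability of presence
    rises with input 0 (dissolved oxygen) and barely reacts to input 1."""
    z = 1.2 * (x[0] - 6.0) - 0.001 * (x[1] - 500.0)
    return 1.0 / (1.0 + math.exp(-z))


# Invented samples: [dissolved oxygen, conductivity]
samples = [[2.0, 400.0], [5.0, 600.0], [8.0, 500.0], [11.0, 500.0]]

profile = sensitivity_profile(toy_model, samples, var_index=0)
for value, probability in profile:
    print(f"dissolved oxygen = {value:5.2f} -> P(present) = {probability:.3f}")
print("response range:", round(response_range(profile), 3))
```

A network that predicts a taxon as always present, as reported for the most complex architectures, would produce a flat profile with a response range of zero, which makes such cases easy to flag.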

5. Conclusions 

Artificial Neural Network models are efficient tools 
to predict the occurrence of macroinvertebrate taxa 
based on the abiotic characteristics of their aquatic environment. 
Several authors have shown that ANN models 
are good alternatives to traditional approaches such 
as multiple regression (Lek et al., 1996). As mentioned 
before, the network structure to be used is very problem 
dependent. The results of this research also indicate 
that the frequency of occurrence of a taxon in 
the whole river basin influences the performance measures 
and the architecture of the network. Based on 
Cohen’s kappa, it could be concluded that ANN 
models predicting the presence/absence of very rare 
taxa (e.g. Aplexa) or very common taxa (e.g. Tubificidae) 
are rather irrelevant, although their CCI is high. 
For the prediction of the presence/absence of Asellidae (a moderately 
present taxon), the highest performances (CCI 
and Cohen’s kappa) were found for the network model 
with two hidden layers each having 10 neurons. When 
calculation time is also taken into account, the network 
model with one hidden layer of 10 neurons 
can be preferred. With this network architecture, 
performance is only slightly worse, while calculation 
time is much shorter. One might also conclude that 
not all network models are capable of finding a relevant 
relation between a variable and a specific taxon. 
For Gammaridae, for example, a rather small network 
structure gave a better idea of the impact of dissolved 
oxygen than a larger one. The challenge will 
be to find the best model configuration if more reliable 
predictions are to be obtained. This is essential 
for a correct ecological interpretation, needed for 
ecosystem management. 